Classifying and Clustering Dialects of North American English
Author
Abstract
This paper presents experiments in which machine learning techniques were applied to the problem of determining regional dialect boundaries. Specifically, decision tree classification and k-means clustering were applied to a corpus of phonetic measurements taken from a large survey of North American English vowels. Pairwise classification and clustering experiments were run for all combinations of ten dialect regions identified by dialectologists. The results show which of these dialect regions are most distinct and which are most similar, suggesting which of the distinctions commonly drawn by linguists are the most meaningful. Furthermore, the classification trees are analyzed to show which vowel formants are most informative for each dialect region.
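The pairwise setup described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the three region names, their formant centers, and all data are synthetic assumptions, only two formants (F1, F2) are used, and a depth-1 decision stump stands in for the full decision trees. For each pair of regions it runs k-means with k=2 and reports how well the clusters match the regions, then finds the single formant split that best separates the pair.

```python
import random
from itertools import combinations

def make_region(center, n, rng, spread=40.0):
    """Draw n synthetic (F1, F2) points in Hz around a hypothetical center."""
    return [(rng.gauss(center[0], spread), rng.gauss(center[1], spread))
            for _ in range(n)]

def two_means(points, iters=20):
    """Plain k-means with k=2; returns a 0/1 cluster label per point."""
    # Deterministic init: the points with the smallest and largest F2.
    centers = [min(points, key=lambda p: p[1]), max(points, key=lambda p: p[1])]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min((0, 1), key=lambda c: sum((a - b) ** 2
                                                for a, b in zip(p, centers[c])))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in (0, 1):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def purity(labels, truth):
    """Fraction of points whose cluster matches the region, up to label swap."""
    agree = sum(l == t for l, t in zip(labels, truth)) / len(truth)
    return max(agree, 1 - agree)

def best_stump(points, truth):
    """Depth-1 decision tree: best (feature index, threshold, training accuracy)."""
    best = (None, None, 0.0)
    for f in range(len(points[0])):
        values = sorted(set(p[f] for p in points))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            acc = sum((p[f] > thr) == bool(t) for p, t in zip(points, truth)) / len(truth)
            acc = max(acc, 1 - acc)  # either side of the split may be region 1
            if acc > best[2]:
                best = (f, thr, acc)
    return best

rng = random.Random(0)
# Hypothetical regions: A and B differ strongly in F2, A and C barely differ.
centers = {"A": (500.0, 1500.0), "B": (500.0, 1900.0), "C": (520.0, 1520.0)}
data = {name: make_region(c, 30, rng) for name, c in centers.items()}

results = {}
for r1, r2 in combinations(sorted(data), 2):
    points = data[r1] + data[r2]
    truth = [0] * len(data[r1]) + [1] * len(data[r2])
    feature, threshold, acc = best_stump(points, truth)
    results[(r1, r2)] = {
        "purity": purity(two_means(points), truth),
        "feature": ("F1", "F2")[feature],
        "stump_accuracy": acc,
    }

for pair, r in results.items():
    print(pair, r)
```

In this toy setting, a pair with high cluster purity and high stump accuracy corresponds to a distinct pair of regions, and the feature chosen by the stump plays the role of the most informative formant. The stump accuracy here is measured on the training data only; a real experiment would hold out test data.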
Similar resources
Intrinsic vowel duration and the post-vocalic voicing effect: some evidence from dialects of North American English
We report the results of a comprehensive dialectal survey of three vowel duration phenomena in North American English: gross duration differences between dialects, the effect of postvocalic consonant voicing, and intrinsic vowel duration. Duration data, from HMM-based forced alignment of phones in the Atlas of North American English corpus [1], showed that 1) the post-vocalic voicing effect app...
A comparison of acoustic and articulatory methods for analyzing vowel differences across dialects: Data from American and Australian English.
In studies of dialect variation, the articulatory nature of vowels is sometimes inferred from formant values using the following heuristic: F1 is inversely correlated with tongue height and F2 is inversely correlated with tongue backness. This study compared vowel formants and corresponding lingual articulation in two dialects of English, standard North American English, and Australian English....
Vowel perception by listeners from different English dialects
Native English listeners from North America rely primarily on changes in formants, not vowel duration, when perceiving the vowel contrast in the minimal pair bit and beat manipulated from a Canadian English sample [5]. In this paper, we evaluated which cue native English listeners from other regions use when perceiving the same North American vowel contrast. For this purpose, we used the sam...
Patterns of Assimilation Nasality in English as a Function of Vowel Height
Assimilation nasality patterns for high, mid and low vowels were studied in two dialects of North American English (Canadian & southeastern American). Native speakers (n=24) produced CVC, NVC, CVN and NVN tokens. The vowel portion of each oral and nasal acoustical signal was transduced by a Nasometer, digitized, and the degree of nasalance established as: % nasalance = nasal rms/(nasal + oral r...
Classifying English Documents by National Dialect
We investigate national dialect identification, the task of classifying English documents according to their country of origin. We use corpora of known national origin as a proxy for national dialect. In order to identify general (as opposed to corpus-specific) characteristics of national dialects of English, we make use of a variety of corpora of different sources, with inter-corpus variation ...